Overview

Dataset statistics

Number of variables: 42
Number of observations: 59400
Missing cells: 46094
Missing cells (%): 1.8%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 19.5 MiB
Average record size in memory: 344.0 B
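
The missing-cell percentage can be cross-checked against the other figures in this table. The following is a minimal sketch of that arithmetic, assuming the percentage is taken over every cell (variables × observations); the variable names are illustrative and not part of the report.

```python
# Figures copied from the dataset statistics above.
n_variables = 42
n_observations = 59400
missing_cells = 46094

# Assumption: the percentage is relative to every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

print(f"{total_cells} cells, {missing_pct:.1f}% missing")
```

Under that assumption the result rounds to the 1.8% shown in the table.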

Variable types

Numeric: 10
Categorical: 30
Boolean: 2

Alerts

recorded_by has constant value "GeoData Consultants Ltd" Constant
date_recorded has a high cardinality: 356 distinct values High cardinality
funder has a high cardinality: 1897 distinct values High cardinality
installer has a high cardinality: 2145 distinct values High cardinality
wpt_name has a high cardinality: 37400 distinct values High cardinality
subvillage has a high cardinality: 19287 distinct values High cardinality
lga has a high cardinality: 125 distinct values High cardinality
ward has a high cardinality: 2092 distinct values High cardinality
scheme_name has a high cardinality: 2696 distinct values High cardinality
gps_height is highly correlated with population and 1 other fields High correlation
population is highly correlated with gps_height and 1 other fields High correlation
construction_year is highly correlated with gps_height and 1 other fields High correlation
gps_height is highly correlated with construction_year High correlation
region_code is highly correlated with district_code High correlation
district_code is highly correlated with region_code High correlation
construction_year is highly correlated with gps_height High correlation
population is highly correlated with construction_year High correlation
construction_year is highly correlated with population High correlation
quantity_group is highly correlated with recorded_by and 1 other fields High correlation
scheme_management is highly correlated with recorded_by and 2 other fields High correlation
payment_type is highly correlated with recorded_by and 1 other fields High correlation
waterpoint_type_group is highly correlated with extraction_type_group and 4 other fields High correlation
source_class is highly correlated with recorded_by and 2 other fields High correlation
water_quality is highly correlated with recorded_by and 1 other fields High correlation
public_meeting is highly correlated with recorded_by High correlation
extraction_type_group is highly correlated with waterpoint_type_group and 4 other fields High correlation
basin is highly correlated with region and 1 other fields High correlation
region is highly correlated with basin and 1 other fields High correlation
status_group is highly correlated with recorded_by High correlation
recorded_by is highly correlated with quantity_group and 22 other fields High correlation
management_group is highly correlated with scheme_management and 2 other fields High correlation
source is highly correlated with source_class and 2 other fields High correlation
near_river is highly correlated with recorded_by High correlation
management is highly correlated with scheme_management and 2 other fields High correlation
quality_group is highly correlated with water_quality and 1 other fields High correlation
waterpoint_type is highly correlated with waterpoint_type_group and 4 other fields High correlation
permit is highly correlated with recorded_by High correlation
source_type is highly correlated with source_class and 2 other fields High correlation
extraction_type is highly correlated with waterpoint_type_group and 4 other fields High correlation
quantity is highly correlated with quantity_group and 1 other fields High correlation
payment is highly correlated with payment_type and 1 other fields High correlation
extraction_type_class is highly correlated with waterpoint_type_group and 4 other fields High correlation
gps_height is highly correlated with latitude and 3 other fields High correlation
longitude is highly correlated with latitude and 5 other fields High correlation
latitude is highly correlated with gps_height and 6 other fields High correlation
basin is highly correlated with gps_height and 7 other fields High correlation
region is highly correlated with gps_height and 17 other fields High correlation
region_code is highly correlated with longitude and 4 other fields High correlation
district_code is highly correlated with latitude and 2 other fields High correlation
scheme_management is highly correlated with longitude and 4 other fields High correlation
construction_year is highly correlated with gps_height and 3 other fields High correlation
extraction_type is highly correlated with basin and 8 other fields High correlation
extraction_type_group is highly correlated with region and 6 other fields High correlation
extraction_type_class is highly correlated with region and 8 other fields High correlation
management is highly correlated with region and 2 other fields High correlation
management_group is highly correlated with scheme_management and 3 other fields High correlation
payment is highly correlated with region and 2 other fields High correlation
payment_type is highly correlated with region and 2 other fields High correlation
water_quality is highly correlated with quality_group High correlation
quality_group is highly correlated with water_quality High correlation
quantity is highly correlated with management_group and 1 other fields High correlation
quantity_group is highly correlated with management_group and 1 other fields High correlation
source is highly correlated with latitude and 8 other fields High correlation
source_type is highly correlated with region and 7 other fields High correlation
source_class is highly correlated with extraction_type and 3 other fields High correlation
waterpoint_type is highly correlated with region and 6 other fields High correlation
waterpoint_type_group is highly correlated with region and 7 other fields High correlation
funder has 3635 (6.1%) missing values Missing
installer has 3655 (6.2%) missing values Missing
public_meeting has 3334 (5.6%) missing values Missing
scheme_management has 3877 (6.5%) missing values Missing
scheme_name has 28166 (47.4%) missing values Missing
permit has 3056 (5.1%) missing values Missing
amount_tsh is highly skewed (γ1 = 57.80779995) Skewed
num_private is highly skewed (γ1 = 91.93374999) Skewed
id is uniformly distributed Uniform
id has unique values Unique
amount_tsh has 41639 (70.1%) zeros Zeros
gps_height has 20438 (34.4%) zeros Zeros
longitude has 1812 (3.1%) zeros Zeros
num_private has 58643 (98.7%) zeros Zeros
population has 21381 (36.0%) zeros Zeros
construction_year has 20709 (34.9%) zeros Zeros

Reproduction

Analysis started: 2021-11-01 17:26:00.828336
Analysis finished: 2021-11-01 17:26:35.361787
Duration: 34.53 seconds
Software version: pandas-profiling v3.1.0
Download configuration: config.json

Variables

id
Real number (ℝ≥0)

UNIFORM
UNIQUE

Distinct: 59400
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 37115.13177
Minimum: 0
Maximum: 74247
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 928.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 3730.9
Q1: 18519.75
median: 37061.5
Q3: 55656.5
95-th percentile: 70564.05
Maximum: 74247
Range: 74247
Interquartile range (IQR): 37136.75

Descriptive statistics

Standard deviation: 21453.12837
Coefficient of variation (CV): 0.5780156866
Kurtosis: -1.201515029
Mean: 37115.13177
Median Absolute Deviation (MAD): 18568.5
Skewness: 0.00262253035
Sum: 2204638827
Variance: 460236716.9
Monotonicity: Not monotonic
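
Several of the derived statistics above can be recomputed from the primary ones. The sketch below uses the standard definitions (CV = standard deviation / mean, IQR = Q3 - Q1, variance = standard deviation squared) with the figures reported for id; the variable names are illustrative only.

```python
# Figures copied from the id statistics above.
mean = 37115.13177
std = 21453.12837
q1, q3 = 18519.75, 55656.5
minimum, maximum = 0, 74247

cv = std / mean                  # coefficient of variation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range
variance = std ** 2              # variance is the squared standard deviation

print(cv, iqr, value_range, variance)
```

Small differences in the last digits are expected because the inputs are already rounded in the report.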
Histogram with fixed size bins (bins=50)

Common values

Value                 Count  Frequency (%)
2047                  1      < 0.1%
72310                 1      < 0.1%
49805                 1      < 0.1%
51852                 1      < 0.1%
62091                 1      < 0.1%
64138                 1      < 0.1%
57993                 1      < 0.1%
60040                 1      < 0.1%
33413                 1      < 0.1%
35460                 1      < 0.1%
Other values (59390)  59390  > 99.9%

Minimum 10 values

Value  Count  Frequency (%)
0      1      < 0.1%
1      1      < 0.1%
2      1      < 0.1%
3      1      < 0.1%
4      1      < 0.1%
5      1      < 0.1%
6      1      < 0.1%
7      1      < 0.1%
8      1      < 0.1%
9      1      < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
74247  1      < 0.1%
74246  1      < 0.1%
74243  1      < 0.1%
74242  1      < 0.1%
74240  1      < 0.1%
74239  1      < 0.1%
74238  1      < 0.1%
74237  1      < 0.1%
74236  1      < 0.1%
74235  1      < 0.1%

amount_tsh
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 98
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 317.6503847
Minimum: 0
Maximum: 350000
Zeros: 41639
Zeros (%): 70.1%
Negative: 0
Negative (%): 0.0%
Memory size: 928.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 20
95-th percentile: 1200
Maximum: 350000
Range: 350000
Interquartile range (IQR): 20

Descriptive statistics

Standard deviation: 2997.574558
Coefficient of variation (CV): 9.436709989
Kurtosis: 4903.543102
Mean: 317.6503847
Median Absolute Deviation (MAD): 0
Skewness: 57.80779995
Sum: 18868432.85
Variance: 8985453.232
Monotonicity: Not monotonic
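
The SKEWED flag reflects a distribution dominated by zeros with a few very large values. The sketch below shows one way to compute a moment-based skewness and how a log1p transform can reduce it. The toy sample is hypothetical, and the estimator is the plain moment ratio, which may differ slightly from the bias-corrected estimator a profiling tool reports.

```python
import math

def skewness(values):
    """Moment-based sample skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical toy sample shaped like amount_tsh: mostly zeros, one huge value.
sample = [0, 0, 0, 0, 0, 0, 0, 20, 500, 350000]
logged = [math.log1p(v) for v in sample]  # log1p keeps zeros at zero

raw_skew = skewness(sample)
log_skew = skewness(logged)
print(raw_skew, log_skew)
```

On this toy sample the transformed values remain right-skewed, only less so; whether such a transform suits the real column is a modelling decision the report does not make.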
Histogram with fixed size bins (bins=50)

Common values

Value              Count  Frequency (%)
0                  41639  70.1%
500                3102   5.2%
50                 2472   4.2%
1000               1488   2.5%
20                 1463   2.5%
200                1220   2.1%
100                816    1.4%
10                 806    1.4%
30                 743    1.3%
2000               704    1.2%
Other values (88)  4947   8.3%

Minimum 10 values

Value  Count  Frequency (%)
0      41639  70.1%
0.2    3      < 0.1%
0.25   1      < 0.1%
1      3      < 0.1%
2      13     < 0.1%
5      376    0.6%
6      190    0.3%
7      69     0.1%
9      1      < 0.1%
10     806    1.4%

Maximum 10 values

Value   Count  Frequency (%)
350000  1      < 0.1%
250000  1      < 0.1%
200000  1      < 0.1%
170000  1      < 0.1%
138000  1      < 0.1%
120000  1      < 0.1%
117000  7      < 0.1%
100000  3      < 0.1%
70000   1      < 0.1%
60000   1      < 0.1%

date_recorded
Categorical

HIGH CARDINALITY

Distinct: 356
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 928.1 KiB

2011-03-15          572
2011-03-17          558
2013-02-03          546
2011-03-14          520
2011-03-16          513
Other values (351)  56691

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 35
Unique (%): 0.1%

Sample

1st row: 2011-03-14
2nd row: 2013-03-06
3rd row: 2013-02-25
4th row: 2013-01-28
5th row: 2011-07-13

Common Values

Value               Count  Frequency (%)
2011-03-15          572    1.0%
2011-03-17          558    0.9%
2013-02-03          546    0.9%
2011-03-14          520    0.9%
2011-03-16          513    0.9%
2011-03-18          497    0.8%
2011-03-19          466    0.8%
2013-02-04          464    0.8%
2013-01-29          459    0.8%
2011-03-04          458    0.8%
Other values (346)  54347  91.5%

Length

Histogram of lengths of the category

funder
Categorical

HIGH CARDINALITY
MISSING

Distinct: 1897
Distinct (%): 3.4%
Missing: 3635
Missing (%): 6.1%
Memory size: 928.1 KiB

Government Of Tanzania  9084
Danida                  3114
Hesawa                  2202
Rwssp                   1374
World Bank              1349
Other values (1892)     38642

Length

Max length: 30
Median length: 6
Mean length: 9.929902268
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 974
Unique (%): 1.7%

Sample

1st row: Roman
2nd row: Grumeti
3rd row: Lottery Club
4th row: Unicef
5th row: Action In A

Common Values

Value                   Count  Frequency (%)
Government Of Tanzania  9084   15.3%
Danida                  3114   5.2%
Hesawa                  2202   3.7%
Rwssp                   1374   2.3%
World Bank              1349   2.3%
Kkkt                    1287   2.2%
World Vision            1246   2.1%
Unicef                  1057   1.8%
Tasaf                   877    1.5%
District Council        843    1.4%
Other values (1887)     33332  56.1%
(Missing)               3635   6.1%
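
With 1897 distinct funders and more than half of the rows falling under "Other values", one common way to tame a high-cardinality column is to keep the most frequent labels and lump the rest together. The sketch below is illustrative only: `lump_rare`, its `top_k` parameter, and the toy data are hypothetical and not part of the report.

```python
from collections import Counter

def lump_rare(values, top_k, other="Other"):
    """Keep the top_k most frequent labels and map the rest to `other`."""
    counts = Counter(v for v in values if v is not None)
    keep = {label for label, _ in counts.most_common(top_k)}
    # Missing values (None) are left untouched so they can be handled separately.
    return [v if v is None or v in keep else other for v in values]

# Hypothetical toy column; None stands in for a missing funder.
funders = ["Government Of Tanzania", "Danida", "Government Of Tanzania",
           "Roman", "Grumeti", None, "Danida", "Government Of Tanzania"]
lumped = lump_rare(funders, top_k=2)
print(Counter(lumped))
```

Missing values are passed through unchanged here so that the 6.1% of missing funders is not silently merged into the "Other" bucket.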

Length

Histogram of lengths of the category

Words

Value                Count  Frequency (%)
of                   9748   10.8%
government           9276   10.3%
tanzania             9172   10.1%
danida               3123   3.5%
world                2789   3.1%
water                2645   2.9%
hesawa               2203   2.4%
bank                 1416   1.6%
rwssp                1376   1.5%
kkkt                 1370   1.5%
Other values (2065)  47254  52.3%

gps_height
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 2428
Distinct (%): 4.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 668.2972391
Minimum: -90
Maximum: 2770
Zeros: 20438
Zeros (%): 34.4%
Negative: 1496
Negative (%): 2.5%
Memory size: 928.1 KiB

Quantile statistics

Minimum: -90
5-th percentile: 0
Q1: 0
median: 369
Q3: 1319.25
95-th percentile: 1797
Maximum: 2770
Range: 2860
Interquartile range (IQR): 1319.25

Descriptive statistics

Standard deviation: 693.1163503
Coefficient of variation (CV): 1.037137833
Kurtosis: -1.292440135
Mean: 668.2972391
Median Absolute Deviation (MAD): 369
Skewness: 0.462402085
Sum: 39696856
Variance: 480410.2751
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value                Count  Frequency (%)
0                    20438  34.4%
-15                  60     0.1%
-16                  55     0.1%
-13                  55     0.1%
-20                  52     0.1%
1290                 52     0.1%
-14                  51     0.1%
303                  51     0.1%
-18                  49     0.1%
-19                  47     0.1%
Other values (2418)  38490  64.8%

Minimum 10 values

Value  Count  Frequency (%)
-90    1      < 0.1%
-63    2      < 0.1%
-59    1      < 0.1%
-57    1      < 0.1%
-55    1      < 0.1%
-54    1      < 0.1%
-53    1      < 0.1%
-52    2      < 0.1%
-51    2      < 0.1%
-50    5      < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
2770   1      < 0.1%
2628   1      < 0.1%
2627   1      < 0.1%
2626   2      < 0.1%
2623   1      < 0.1%
2614   1      < 0.1%
2585   1      < 0.1%
2576   1      < 0.1%
2569   1      < 0.1%
2568   1      < 0.1%

installer
Categorical

HIGH CARDINALITY
MISSING

Distinct: 2145
Distinct (%): 3.8%
Missing: 3655
Missing (%): 6.2%
Memory size: 928.1 KiB

DWE                  17402
Government           1825
RWE                  1206
Commu                1060
DANIDA               1050
Other values (2140)  33202

Length

Max length: 30
Median length: 4
Mean length: 6.111202798
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1098
Unique (%): 2.0%

Sample

1st row: Roman
2nd row: GRUMETI
3rd row: World vision
4th row: UNICEF
5th row: Artisan

Common Values

Value                Count  Frequency (%)
DWE                  17402  29.3%
Government           1825   3.1%
RWE                  1206   2.0%
Commu                1060   1.8%
DANIDA               1050   1.8%
KKKT                 898    1.5%
Hesawa               840    1.4%
0                    777    1.3%
TCRS                 707    1.2%
Central government   622    1.0%
Other values (2135)  29358  49.4%
(Missing)            3655   6.2%
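
The installer labels mix capitalisation styles (for example "DANIDA" and "UNICEF" here versus "Danida" and "Unicef" under funder, or "GRUMETI" versus "Grumeti" in the samples), so some of the 2145 distinct values may be spelling variants of one another. That is an assumption, not something the report states. A minimal sketch of case and whitespace normalisation before counting, with a hypothetical helper and toy data:

```python
from collections import Counter

def normalize_label(label):
    """Trim, collapse internal whitespace, and case-fold a label."""
    return " ".join(label.split()).casefold()

# Hypothetical toy column mixing spellings that differ only by case or spacing.
installers = ["DWE", "dwe", "Danida", "DANIDA", "World vision", "World  Vision"]
raw_counts = Counter(installers)
merged_counts = Counter(normalize_label(v) for v in installers)
print(len(raw_counts), "->", len(merged_counts))
```

Case folding only merges variants that differ by case or spacing; abbreviations and misspellings would need a separate mapping.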

Length

Histogram of lengths of the category

Words

Value                Count  Frequency (%)
dwe                  17601  25.8%
government           2778   4.1%
water                1881   2.8%
hesawa               1395   2.0%
rwe                  1230   1.8%
district             1216   1.8%
kkkt                 1153   1.7%
council              1106   1.6%
commu                1065   1.6%
danida               1051   1.5%
Other values (1976)  37806  55.4%

longitude
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 57516
Distinct (%): 96.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 34.07742669
Minimum: 0
Maximum: 40.34519307
Zeros: 1812
Zeros (%): 3.1%
Negative: 0
Negative (%): 0.0%
Memory size: 928.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 30.04066001
Q1: 33.09034738
median: 34.90874343
Q3: 37.17838657
95-th percentile: 39.13323954
Maximum: 40.34519307
Range: 40.34519307
Interquartile range (IQR): 4.08803919

Descriptive statistics

Standard deviation: 6.567431846
Coefficient of variation (CV): 0.1927208854
Kurtosis: 19.18703105
Mean: 34.07742669
Median Absolute Deviation (MAD): 2.032511095
Skewness: -4.191046455
Sum: 2024199.146
Variance: 43.13116105
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value                 Count  Frequency (%)
0                     1812   3.1%
37.54090064           2      < 0.1%
33.01050977           2      < 0.1%
39.09348389           2      < 0.1%
32.9727187            2      < 0.1%
33.00627548           2      < 0.1%
39.10395018           2      < 0.1%
37.54278497           2      < 0.1%
36.80248988           2      < 0.1%
39.09837398           2      < 0.1%
Other values (57506)  57570  96.9%

Minimum 10 values

Value        Count  Frequency (%)
0            1812   3.1%
29.6071219   1      < 0.1%
29.60720109  1      < 0.1%
29.61032056  1      < 0.1%
29.61096482  1      < 0.1%
29.61194674  1      < 0.1%
29.61250689  1      < 0.1%
29.61276296  1      < 0.1%
29.61344309  1      < 0.1%
29.6168718   1      < 0.1%

Maximum 10 values

Value        Count  Frequency (%)
40.34519307  1      < 0.1%
40.34430089  1      < 0.1%
40.32523996  1      < 0.1%
40.32522643  1      < 0.1%
40.32340181  1      < 0.1%
40.32283237  1      < 0.1%
40.32280453  1      < 0.1%
40.3226251   1      < 0.1%
40.32216902  1      < 0.1%
40.32196593  1      < 0.1%

latitude
Real number (ℝ)

HIGH CORRELATION

Distinct: 57517
Distinct (%): 96.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -5.70603266
Minimum: -11.64944018
Maximum: -2 × 10^-8
Zeros: 0
Zeros (%): 0.0%
Negative: 59400
Negative (%): 100.0%
Memory size: 928.1 KiB

Quantile statistics

Minimum: -11.64944018
5-th percentile: -10.58554992
Q1: -8.540621305
median: -5.02159665
Q3: -3.32615564
95-th percentile: -1.408872227
Maximum: -2 × 10^-8
Range: 11.64944016
Interquartile range (IQR): 5.214465665

Descriptive statistics

Standard deviation: 2.946019081
Coefficient of variation (CV): -0.5162990219
Kurtosis: -1.057616666
Mean: -5.70603266
Median Absolute Deviation (MAD): 2.07002988
Skewness: -0.1520365709
Sum: -338938.34
Variance: 8.679028427
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value                 Count  Frequency (%)
-2 × 10^-8            1812   3.1%
-6.98584173           2      < 0.1%
-3.79757861           2      < 0.1%
-6.98188419           2      < 0.1%
-7.10462503           2      < 0.1%
-7.05692253           2      < 0.1%
-7.17517443           2      < 0.1%
-6.99073094           2      < 0.1%
-6.9787555            2      < 0.1%
-6.99470401           2      < 0.1%
Other values (57507)  57570  96.9%

Minimum 10 values

Value         Count  Frequency (%)
-11.64944018  1      < 0.1%
-11.64837759  1      < 0.1%
-11.58629656  1      < 0.1%
-11.56857679  1      < 0.1%
-11.56680457  1      < 0.1%
-11.56450865  1      < 0.1%
-11.56432357  1      < 0.1%
-11.56231592  1      < 0.1%
-11.56228898  1      < 0.1%
-11.56161898  1      < 0.1%

Maximum 10 values

Value        Count  Frequency (%)
-2 × 10^-8   1812   3.1%
-0.99846435  1      < 0.1%
-0.998916    1      < 0.1%
-0.99901209  1      < 0.1%
-0.99911702  1      < 0.1%
-0.9994692   1      < 0.1%
-0.99950651  1      < 0.1%
-0.99952232  1      < 0.1%
-1.00058519  1      < 0.1%
-1.0015208   1      < 0.1%
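
The latitude value -2 × 10^-8 occurs 1812 times, the same count as the zeros in longitude. One plausible reading, which the report itself does not confirm, is that these are the same rows and that the pair (0, about 0) is a placeholder for an unrecorded location. A minimal sketch for flagging such rows, with a hypothetical helper, tolerance, and toy data:

```python
def is_placeholder_coordinate(longitude, latitude, tol=1e-6):
    """True when a point sits at (0, ~0), matching the suspected placeholder."""
    return longitude == 0 and abs(latitude) < tol

# Hypothetical toy rows: (longitude, latitude).
rows = [(0.0, -2e-08), (34.90874343, -5.02159665), (37.54090064, -6.98584173)]
flags = [is_placeholder_coordinate(lon, lat) for lon, lat in rows]
print(flags)
```

Checking both coordinates together, rather than longitude alone, avoids flagging a row on the strength of a single column.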

wpt_name
Categorical

HIGH CARDINALITY

Distinct: 37400
Distinct (%): 63.0%
Missing: 0
Missing (%): 0.0%
Memory size: 928.1 KiB

none                  3563
Shuleni               1748
Zahanati              830
Msikitini             535
Kanisani              323
Other values (37395)  52401

Length

Max length: 30
Median length: 10
Mean length: 10.96210438
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 32928
Unique (%): 55.4%

Sample

1st row: none
2nd row: Zahanati
3rd row: Kwa Mahundi
4th row: Zahanati Ya Nanyumbu
5th row: Shuleni

Common Values

Value                 Count  Frequency (%)
none                  3563   6.0%
Shuleni               1748   2.9%
Zahanati              830    1.4%
Msikitini             535    0.9%
Kanisani              323    0.5%
Bombani               271    0.5%
Sokoni                260    0.4%
Ofisini               254    0.4%
School                208    0.4%
Shule Ya Msingi       199    0.3%
Other values (37390)  51209  86.2%

Length

Histogram of lengths of the category

Words

Value                 Count  Frequency (%)
kwa                   21384  19.6%
none                  3565   3.3%
mzee                  3385   3.1%
shuleni               2123   1.9%
ya                    1499   1.4%
shule                 1389   1.3%
school                1113   1.0%
primary               1052   1.0%
zahanati              983    0.9%
msingi                870    0.8%
Other values (29461)  71931  65.8%

num_private
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 65
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.4741414141
Minimum: 0
Maximum: 1776
Zeros: 58643
Zeros (%): 98.7%
Negative: 0
Negative (%): 0.0%
Memory size: 928.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 0
Maximum: 1776
Range: 1776
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 12.23622981
Coefficient of variation (CV): 25.80713147
Kurtosis: 11137.29521
Mean: 0.4741414141
Median Absolute Deviation (MAD): 0
Skewness: 91.93374999
Sum: 28164
Variance: 149.72532
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value              Count  Frequency (%)
0                  58643  98.7%
6                  81     0.1%
1                  73     0.1%
5                  46     0.1%
8                  46     0.1%
32                 40     0.1%
45                 36     0.1%
15                 35     0.1%
39                 30     0.1%
93                 28     < 0.1%
Other values (55)  342    0.6%

Minimum 10 values

Value  Count  Frequency (%)
0      58643  98.7%
1      73     0.1%
2      23     < 0.1%
3      27     < 0.1%
4      20     < 0.1%
5      46     0.1%
6      81     0.1%
7      26     < 0.1%
8      46     0.1%
9      4      < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
1776   1      < 0.1%
1402   1      < 0.1%
755    1      < 0.1%
698    1      < 0.1%
672    1      < 0.1%
668    1      < 0.1%
450    1      < 0.1%
300    1      < 0.1%
280    1      < 0.1%
240    1      < 0.1%

basin
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 9
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 928.1 KiB

Lake Victoria     10248
Pangani           8940
Rufiji            7976
Internal          7785
Lake Tanganyika   6432
Other values (4)  18019

Length

Max length: 23
Median length: 10
Mean length: 10.8923569
Min length: 6

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Lake Nyasa
2nd row: Lake Victoria
3rd row: Pangani
4th row: Ruvuma / Southern Coast
5th row: Lake Victoria

Common Values

Value                    Count  Frequency (%)
Lake Victoria            10248  17.3%
Pangani                  8940   15.1%
Rufiji                   7976   13.4%
Internal                 7785   13.1%
Lake Tanganyika          6432   10.8%
Wami / Ruvu              5987   10.1%
Lake Nyasa               5085   8.6%
Ruvuma / Southern Coast  4493   7.6%
Lake Rukwa               2454   4.1%

Length

Histogram of lengths of the category

Pie chart

Words

Value             Count  Frequency (%)
lake              24219  22.2%
(blank)           10480  9.6%
victoria          10248  9.4%
pangani           8940   8.2%
rufiji            7976   7.3%
internal          7785   7.1%
tanganyika        6432   5.9%
ruvu              5987   5.5%
wami              5987   5.5%
nyasa             5085   4.7%
Other values (4)  15933  14.6%

subvillage
Categorical

HIGH CARDINALITY

Distinct: 19287
Distinct (%): 32.7%
Missing: 371
Missing (%): 0.6%
Memory size: 928.1 KiB

Madukani              508
Shuleni               506
Majengo               502
Kati                  373
Mtakuja               262
Other values (19282)  56878

Length

Max length: 30
Median length: 7
Mean length: 7.897592709
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 9424
Unique (%): 16.0%

Sample

1st row: Mnyusi B
2nd row: Nyamara
3rd row: Majengo
4th row: Mahakamani
5th row: Kyanyamisa

Common Values

Value                 Count  Frequency (%)
Madukani              508    0.9%
Shuleni               506    0.9%
Majengo               502    0.8%
Kati                  373    0.6%
Mtakuja               262    0.4%
Sokoni                232    0.4%
M                     187    0.3%
Muungano              172    0.3%
Mbuyuni               164    0.3%
Mlimani               152    0.3%
Other values (19277)  55971  94.2%
(Missing)             371    0.6%

Length

Histogram of lengths of the category

Words

Value                 Count  Frequency (%)
a                     2387   3.4%
b                     2043   2.9%
kati                  1902   2.7%
majengo               610    0.9%
wa                    600    0.8%
shuleni               593    0.8%
madukani              569    0.8%
mtaa                  514    0.7%
juu                   403    0.6%
mjini                 378    0.5%
Other values (17024)  60795  85.9%

region
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 21
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 928.1 KiB

Iringa             5294
Shinyanga          4982
Mbeya              4639
Kilimanjaro        4379
Morogoro           4006
Other values (16)  36100

Length

Max length: 13
Median length: 6
Mean length: 6.623754209
Min length: 4

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Iringa
2nd row: Mara
3rd row: Manyara
4th row: Mtwara
5th row: Kagera

Common Values

Value              Count  Frequency (%)
Iringa             5294   8.9%
Shinyanga          4982   8.4%
Mbeya              4639   7.8%
Kilimanjaro        4379   7.4%
Morogoro           4006   6.7%
Arusha             3350   5.6%
Kagera             3316   5.6%
Mwanza             3102   5.2%
Kigoma             2816   4.7%
Ruvuma             2640   4.4%
Other values (11)  20876  35.1%

Length

Histogram of lengths of the category

Words

Value              Count  Frequency (%)
iringa             5294   8.7%
shinyanga          4982   8.2%
mbeya              4639   7.6%
kilimanjaro        4379   7.2%
morogoro           4006   6.6%
arusha             3350   5.5%
kagera             3316   5.4%
mwanza             3102   5.1%
kigoma             2816   4.6%
ruvuma             2640   4.3%
Other values (13)  22486  36.9%

region_code
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION

Distinct: 27
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15.29700337
Minimum: 1
Maximum: 99
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 928.1 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 5
median: 12
Q3: 17
95-th percentile: 60
Maximum: 99
Range: 98
Interquartile range (IQR): 12

Descriptive statistics

Standard deviation: 17.58740634
Coefficient of variation (CV): 1.149728866
Kurtosis: 10.28843341
Mean: 15.29700337
Median Absolute Deviation (MAD): 6
Skewness: 3.17381811
Sum: 908642
Variance: 309.3168617
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=27)

Common values

Value              Count  Frequency (%)
11                 5300   8.9%
17                 5011   8.4%
12                 4639   7.8%
3                  4379   7.4%
5                  4040   6.8%
18                 3324   5.6%
19                 3047   5.1%
2                  3024   5.1%
16                 2816   4.7%
10                 2640   4.4%
Other values (17)  21180  35.7%

Minimum 10 values

Value  Count  Frequency (%)
1      2201   3.7%
2      3024   5.1%
3      4379   7.4%
4      2513   4.2%
5      4040   6.8%
6      1609   2.7%
7      805    1.4%
8      300    0.5%
9      390    0.7%
10     2640   4.4%

Maximum 10 values

Value  Count  Frequency (%)
99     423    0.7%
90     917    1.5%
80     1238   2.1%
60     1025   1.7%
40     1      < 0.1%
24     326    0.5%
21     1583   2.7%
20     1969   3.3%
19     3047   5.1%
18     3324   5.6%

district_code
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION

Distinct: 20
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.629747475
Minimum: 0
Maximum: 80
Zeros: 23
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 928.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 2
median: 3
Q3: 5
95-th percentile: 30
Maximum: 80
Range: 80
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 9.633648629
Coefficient of variation (CV): 1.711204396
Kurtosis: 16.21428363
Mean: 5.629747475
Median Absolute Deviation (MAD): 1
Skewness: 3.962045299
Sum: 334407
Variance: 92.80718592
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=20)

Common values

Value              Count  Frequency (%)
1                  12203  20.5%
2                  11173  18.8%
3                  9998   16.8%
4                  8999   15.1%
5                  4356   7.3%
6                  4074   6.9%
7                  3343   5.6%
8                  1043   1.8%
30                 995    1.7%
33                 874    1.5%
Other values (10)  2342   3.9%

Minimum 10 values

Value  Count  Frequency (%)
0      23     < 0.1%
1      12203  20.5%
2      11173  18.8%
3      9998   16.8%
4      8999   15.1%
5      4356   7.3%
6      4074   6.9%
7      3343   5.6%
8      1043   1.8%
13     391    0.7%

Maximum 10 values

Value  Count  Frequency (%)
80     12     < 0.1%
67     6      < 0.1%
63     195    0.3%
62     109    0.2%
60     63     0.1%
53     745    1.3%
43     505    0.9%
33     874    1.5%
30     995    1.7%
23     293    0.5%

lga
Categorical

HIGH CARDINALITY

Distinct: 125
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 928.1 KiB

Njombe              2503
Arusha Rural        1252
Moshi Rural         1251
Bariadi             1177
Rungwe              1106
Other values (120)  52111

Length

Max length: 16
Median length: 6
Mean length: 7.416885522
Min length: 3

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: Ludewa
2nd row: Serengeti
3rd row: Simanjiro
4th row: Nanyumbu
5th row: Karagwe

Common Values

Value               Count  Frequency (%)
Njombe              2503   4.2%
Arusha Rural        1252   2.1%
Moshi Rural         1251   2.1%
Bariadi             1177   2.0%
Rungwe              1106   1.9%
Kilosa              1094   1.8%
Kasulu              1047   1.8%
Mbozi               1034   1.7%
Meru                1009   1.7%
Bagamoyo            997    1.7%
Other values (115)  46930  79.0%

Length

Histogram of lengths of the category

Words

Value               Count  Frequency (%)
rural               9552   13.5%
njombe              2503   3.5%
urban               1683   2.4%
moshi               1330   1.9%
arusha              1315   1.9%
bariadi             1177   1.7%
singida             1172   1.7%
rungwe              1106   1.6%
kilosa              1094   1.5%
kasulu              1047   1.5%
Other values (106)  48656  68.9%

ward
Categorical

HIGH CARDINALITY

Distinct: 2092
Distinct (%): 3.5%
Missing: 0
Missing (%): 0.0%
Memory size: 928.1 KiB

Igosi                307
Imalinyi             252
Siha Kati            232
Mdandu               231
Nduruma              217
Other values (2087)  58161

Length

Max length: 23
Median length: 7
Mean length: 7.505841751
Min length: 3

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 30
Unique (%): 0.1%

Sample

1st row: Mundindi
2nd row: Natta
3rd row: Ngorika
4th row: Nanyumbu
5th row: Nyakasimbi

Common Values

ValueCountFrequency (%)
Igosi307
 
0.5%
Imalinyi252
 
0.4%
Siha Kati232
 
0.4%
Mdandu231
 
0.4%
Nduruma217
 
0.4%
Mishamo203
 
0.3%
Kitunda203
 
0.3%
Msindo201
 
0.3%
Chalinze196
 
0.3%
Maji ya Chai190
 
0.3%
Other values (2082)57168
96.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
mashariki580
 
0.9%
urban540
 
0.8%
siha434
 
0.7%
kusini393
 
0.6%
magharibi362
 
0.6%
igosi307
 
0.5%
masama303
 
0.5%
machame293
 
0.5%
kati270
 
0.4%
imalinyi252
 
0.4%
Other values (2106)61033
94.2%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

population
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 1049
Distinct (%): 1.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 179.9099832
Minimum: 0
Maximum: 30500
Zeros: 21381
Zeros (%): 36.0%
Negative: 0
Negative (%): 0.0%
Memory size: 928.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 25
Q3: 215
95-th percentile: 680
Maximum: 30500
Range: 30500
Interquartile range (IQR): 215

Descriptive statistics

Standard deviation: 471.4821757
Coefficient of variation (CV): 2.620655994
Kurtosis: 402.2801153
Mean: 179.9099832
Median Absolute Deviation (MAD): 25
Skewness: 12.66071359
Sum: 10686653
Variance: 222295.442
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value                  Count   Frequency (%)
0                      21381   36.0%
1                       7025   11.8%
200                     1940   3.3%
150                     1892   3.2%
250                     1681   2.8%
300                     1476   2.5%
100                     1146   1.9%
50                      1139   1.9%
500                     1009   1.7%
350                      986   1.7%
Other values (1039)    19725   33.2%
Minimum values
Value                  Count   Frequency (%)
0                      21381   36.0%
1                       7025   11.8%
2                          4   < 0.1%
3                          4   < 0.1%
4                         13   < 0.1%
5                         44   0.1%
6                         19   < 0.1%
7                          3   < 0.1%
8                         23   < 0.1%
9                         11   < 0.1%
Maximum values
Value                  Count   Frequency (%)
30500                      1   < 0.1%
15300                      1   < 0.1%
11463                      1   < 0.1%
10000                      3   < 0.1%
9865                       1   < 0.1%
9500                       1   < 0.1%
9000                       3   < 0.1%
8848                       1   < 0.1%
8600                       1   < 0.1%
8500                       1   < 0.1%

public_meeting
Boolean

HIGH CORRELATION
MISSING

Distinct2
Distinct (%)< 0.1%
Missing3334
Missing (%)5.6%
Memory size928.1 KiB
True
51011 
False
 
5055
(Missing)
 
3334
Value        Count   Frequency (%)
True         51011   85.9%
False         5055   8.5%
(Missing)     3334   5.6%

recorded_by
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct1
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size928.1 KiB
GeoData Consultants Ltd
59400 

Length

Max length23
Median length23
Mean length23
Min length23

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowGeoData Consultants Ltd
2nd rowGeoData Consultants Ltd
3rd rowGeoData Consultants Ltd
4th rowGeoData Consultants Ltd
5th rowGeoData Consultants Ltd

Common Values

ValueCountFrequency (%)
GeoData Consultants Ltd59400
100.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
ltd59400
33.3%
consultants59400
33.3%
geodata59400
33.3%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block
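
The CONSTANT and REJECTED flags on recorded_by reflect that every one of the 59400 rows holds the same value, so the column carries no information for modelling. A minimal sketch of how such columns could be detected is shown below; the helper name `constant_columns` and the toy data are hypothetical, and the report's own rejection logic may differ.

```python
def constant_columns(columns):
    """Return names of columns whose non-missing values are all identical."""
    result = []
    for name, values in columns.items():
        # Ignore missing entries (None) when collecting distinct values.
        distinct = {v for v in values if v is not None}
        if len(distinct) <= 1:
            result.append(name)
    return result


# Toy data: column name -> list of values (illustrative only).
toy = {
    "recorded_by": ["GeoData Consultants Ltd"] * 4,
    "quantity": ["enough", "insufficient", "enough", "dry"],
}
to_drop = constant_columns(toy)
print(to_drop)
```

Columns returned by such a check are natural candidates for removal before any further analysis.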

scheme_management
Categorical

HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct12
Distinct (%)< 0.1%
Missing3877
Missing (%)6.5%
Memory size928.1 KiB
VWC
36793 
WUG
5206 
Water authority
 
3153
WUA
 
2883
Water Board
 
2748
Other values (7)
4740 

Length

Max length16
Median length3
Mean length4.644723808
Min length3

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1 ?
Unique (%)< 0.1%

Sample

1st rowVWC
2nd rowOther
3rd rowVWC
4th rowVWC
5th rowVWC

Common Values

ValueCountFrequency (%)
VWC36793
61.9%
WUG5206
 
8.8%
Water authority3153
 
5.3%
WUA2883
 
4.9%
Water Board2748
 
4.6%
Parastatal1680
 
2.8%
Private operator1063
 
1.8%
Company1061
 
1.8%
Other766
 
1.3%
SWC97
 
0.2%
Other values (2)73
 
0.1%
(Missing)3877
 
6.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
vwc36793
58.9%
water5901
 
9.4%
wug5206
 
8.3%
authority3153
 
5.0%
wua2883
 
4.6%
board2748
 
4.4%
parastatal1680
 
2.7%
operator1063
 
1.7%
private1063
 
1.7%
company1061
 
1.7%
Other values (4)936
 
1.5%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

scheme_name
Categorical

HIGH CARDINALITY
MISSING

Distinct2696
Distinct (%)8.6%
Missing28166
Missing (%)47.4%
Memory size928.1 KiB
K
 
682
None
 
644
Borehole
 
546
Chalinze wate
 
405
M
 
400
Other values (2691)
28557 

Length

Max length46
Median length13
Mean length14.30521227
Min length1

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique712 ?
Unique (%)2.3%

Sample

1st rowRoman
2nd rowNyumba ya mungu pipe scheme
3rd rowZingibali
4th rowBL Bondeni
5th rowNone

Common Values

ValueCountFrequency (%)
K682
 
1.1%
None644
 
1.1%
Borehole546
 
0.9%
Chalinze wate405
 
0.7%
M400
 
0.7%
DANIDA379
 
0.6%
Government320
 
0.5%
Ngana water supplied scheme270
 
0.5%
wanging'ombe water supply s261
 
0.4%
wanging'ombe supply scheme234
 
0.4%
Other values (2686)27093
45.6%
(Missing)28166
47.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
water9770
 
13.6%
supply6745
 
9.4%
scheme2532
 
3.5%
wa2157
 
3.0%
gravity1914
 
2.7%
pipe1346
 
1.9%
maji1343
 
1.9%
mradi1097
 
1.5%
line1016
 
1.4%
supplied877
 
1.2%
Other values (2506)43219
60.0%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

permit
Boolean

HIGH CORRELATION
MISSING

Distinct2
Distinct (%)< 0.1%
Missing3056
Missing (%)5.1%
Memory size928.1 KiB
True
38852 
False
17492 
(Missing)
 
3056
Value        Count   Frequency (%)
True         38852   65.4%
False        17492   29.4%
(Missing)     3056   5.1%

construction_year
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 55
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1300.652475
Minimum: 0
Maximum: 2013
Zeros: 20709
Zeros (%): 34.9%
Negative: 0
Negative (%): 0.0%
Memory size: 928.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 1986
Q3: 2004
95-th percentile: 2010
Maximum: 2013
Range: 2013
Interquartile range (IQR): 2004

Descriptive statistics

Standard deviation: 951.6205473
Coefficient of variation (CV): 0.7316485885
Kurtosis: -1.596432369
Mean: 1300.652475
Median Absolute Deviation (MAD): 22
Skewness: -0.6349277866
Sum: 77258757
Variance: 905581.6661
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value                  Count   Frequency (%)
0                      20709   34.9%
2010                    2645   4.5%
2008                    2613   4.4%
2009                    2533   4.3%
2000                    2091   3.5%
2007                    1587   2.7%
2006                    1471   2.5%
2003                    1286   2.2%
2011                    1256   2.1%
2004                    1123   1.9%
Other values (45)      22086   37.2%
Minimum values
Value                  Count   Frequency (%)
0                      20709   34.9%
1960                     102   0.2%
1961                      21   < 0.1%
1962                      30   0.1%
1963                      85   0.1%
1964                      40   0.1%
1965                      19   < 0.1%
1966                      17   < 0.1%
1967                      88   0.1%
1968                      77   0.1%
Maximum values
Value                  Count   Frequency (%)
2013                     176   0.3%
2012                    1084   1.8%
2011                    1256   2.1%
2010                    2645   4.5%
2009                    2533   4.3%
2008                    2613   4.4%
2007                    1587   2.7%
2006                    1471   2.5%
2005                    1011   1.7%
2004                    1123   1.9%
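
The ZEROS alert matters for construction_year: 34.9% of rows hold 0 and the next smallest value is 1960, so the reported mean of about 1300 does not correspond to any real year. The sketch below assumes that 0 is a placeholder for an unknown year (the report itself does not state this) and shows, on toy data, how the summary changes once zeros are excluded. The function name `summarize_ignoring_zeros` is hypothetical.

```python
from statistics import mean, median


def summarize_ignoring_zeros(years):
    """Compare statistics with and without zeros (assumed to mean 'unknown')."""
    known = [y for y in years if y != 0]
    return {
        "n_total": len(years),
        "n_zero": len(years) - len(known),
        "mean_all": mean(years),
        "mean_known": mean(known) if known else None,
        "median_known": median(known) if known else None,
    }


# Toy values only; these are not rows taken from the dataset.
toy_years = [0, 0, 1986, 2004, 2010]
summary = summarize_ignoring_zeros(toy_years)
print(summary)
```

On the toy input the mean moves from 1200 to 2000 once zeros are dropped, which mirrors the kind of distortion visible in the statistics above.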

extraction_type
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct18
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size928.1 KiB
gravity
26780 
nira/tanira
8154 
other
6430 
submersible
4764 
swn 80
3670 
Other values (13)
9602 

Length

Max length25
Median length7
Mean length7.719511785
Min length3

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowgravity
2nd rowgravity
3rd rowgravity
4th rowsubmersible
5th rowgravity

Common Values

Value                  Count   Frequency (%)
gravity                26780   45.1%
nira/tanira             8154   13.7%
other                   6430   10.8%
submersible             4764   8.0%
swn 80                  3670   6.2%
mono                    2865   4.8%
india mark ii           2400   4.0%
afridev                 1770   3.0%
ksb                     1415   2.4%
other - rope pump        451   0.8%
Other values (8)         701   1.2%

Length

Histogram of lengths of the category
Value                  Count   Frequency (%)
gravity                26780   38.1%
nira/tanira             8154   11.6%
other                   7197   10.2%
submersible             4764   6.8%
swn                     3899   5.5%
80                      3670   5.2%
mono                    2865   4.1%
india                   2498   3.6%
mark                    2498   3.6%
ii                      2400   3.4%
Other values (13)       5640   8.0%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

extraction_type_group
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct13
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size928.1 KiB
gravity
26780 
nira/tanira
8154 
other
6430 
submersible
6179 
swn 80
3670 
Other values (8)
8187 

Length

Max length15
Median length7
Mean length7.880538721
Min length4

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowgravity
2nd rowgravity
3rd rowgravity
4th rowsubmersible
5th rowgravity

Common Values

Value                  Count   Frequency (%)
gravity                26780   45.1%
nira/tanira             8154   13.7%
other                   6430   10.8%
submersible             6179   10.4%
swn 80                  3670   6.2%
mono                    2865   4.8%
india mark ii           2400   4.0%
afridev                 1770   3.0%
rope pump                451   0.8%
other handpump           364   0.6%
Other values (3)         337   0.6%

Length

Histogram of lengths of the category
Value                  Count   Frequency (%)
gravity                26780   38.8%
nira/tanira             8154   11.8%
other                   6916   10.0%
submersible             6179   9.0%
swn                     3670   5.3%
80                      3670   5.3%
mono                    2865   4.2%
india                   2498   3.6%
mark                    2498   3.6%
ii                      2400   3.5%
Other values (7)        3373   4.9%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

extraction_type_class
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct7
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size928.1 KiB
gravity
26780 
handpump
16456 
other
6430 
submersible
6179 
motorpump
2987 
Other values (2)
 
568

Length

Max length12
Median length7
Mean length7.602239057
Min length5

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowgravity
2nd rowgravity
3rd rowgravity
4th rowsubmersible
5th rowgravity

Common Values

ValueCountFrequency (%)
gravity26780
45.1%
handpump16456
27.7%
other6430
 
10.8%
submersible6179
 
10.4%
motorpump2987
 
5.0%
rope pump451
 
0.8%
wind-powered117
 
0.2%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
gravity26780
44.7%
handpump16456
27.5%
other6430
 
10.7%
submersible6179
 
10.3%
motorpump2987
 
5.0%
pump451
 
0.8%
rope451
 
0.8%
wind-powered117
 
0.2%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

management
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct12
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size928.1 KiB
vwc
40507 
wug
6515 
water board
 
2933
wua
 
2535
private operator
 
1971
Other values (7)
4939 

Length

Max length16
Median length3
Mean length4.350639731
Min length3

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowvwc
2nd rowwug
3rd rowvwc
4th rowvwc
5th rowother

Common Values

ValueCountFrequency (%)
vwc40507
68.2%
wug6515
 
11.0%
water board2933
 
4.9%
wua2535
 
4.3%
private operator1971
 
3.3%
parastatal1768
 
3.0%
water authority904
 
1.5%
other844
 
1.4%
company685
 
1.2%
unknown561
 
0.9%
Other values (2)177
 
0.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
vwc40507
61.9%
wug6515
 
10.0%
water3837
 
5.9%
board2933
 
4.5%
wua2535
 
3.9%
operator1971
 
3.0%
private1971
 
3.0%
parastatal1768
 
2.7%
other943
 
1.4%
authority904
 
1.4%
Other values (5)1522
 
2.3%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

management_group
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct5
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size928.1 KiB
user-group
52490 
commercial
 
3638
parastatal
 
1768
other
 
943
unknown
 
561

Length

Max length10
Median length10
Mean length9.892289562
Min length5

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowuser-group
2nd rowuser-group
3rd rowuser-group
4th rowuser-group
5th rowother

Common Values

ValueCountFrequency (%)
user-group52490
88.4%
commercial3638
 
6.1%
parastatal1768
 
3.0%
other943
 
1.6%
unknown561
 
0.9%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
user-group52490
88.4%
commercial3638
 
6.1%
parastatal1768
 
3.0%
other943
 
1.6%
unknown561
 
0.9%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

payment
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct7
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size928.1 KiB
never pay
25348 
pay per bucket
8985 
pay monthly
8300 
unknown
8157 
pay when scheme fails
3914 
Other values (2)
4696 

Length

Max length21
Median length9
Mean length10.66479798
Min length5

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowpay annually
2nd rownever pay
3rd rowpay per bucket
4th rownever pay
5th rownever pay

Common Values

ValueCountFrequency (%)
never pay25348
42.7%
pay per bucket8985
 
15.1%
pay monthly8300
 
14.0%
unknown8157
 
13.7%
pay when scheme fails3914
 
6.6%
pay annually3642
 
6.1%
other1054
 
1.8%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
pay50189
39.7%
never25348
20.1%
bucket8985
 
7.1%
per8985
 
7.1%
monthly8300
 
6.6%
unknown8157
 
6.5%
fails3914
 
3.1%
scheme3914
 
3.1%
when3914
 
3.1%
annually3642
 
2.9%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

payment_type
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct7
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size928.1 KiB
never pay
25348 
per bucket
8985 
monthly
8300 
unknown
8157 
on failure
3914 
Other values (2)
4696 

Length

Max length10
Median length9
Mean length8.530757576
Min length5

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowannually
2nd rownever pay
3rd rowper bucket
4th rownever pay
5th rownever pay

Common Values

ValueCountFrequency (%)
never pay25348
42.7%
per bucket8985
 
15.1%
monthly8300
 
14.0%
unknown8157
 
13.7%
on failure3914
 
6.6%
annually3642
 
6.1%
other1054
 
1.8%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
pay25348
26.0%
never25348
26.0%
bucket8985
 
9.2%
per8985
 
9.2%
monthly8300
 
8.5%
unknown8157
 
8.4%
failure3914
 
4.0%
on3914
 
4.0%
annually3642
 
3.7%
other1054
 
1.1%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

water_quality
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct8
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size928.1 KiB
soft
50818 
salty
 
4856
unknown
 
1876
milky
 
804
coloured
 
490
Other values (3)
 
556

Length

Max length18
Median length4
Mean length4.303282828
Min length4

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowsoft
2nd rowsoft
3rd rowsoft
4th rowsoft
5th rowsoft

Common Values

ValueCountFrequency (%)
soft50818
85.6%
salty4856
 
8.2%
unknown1876
 
3.2%
milky804
 
1.4%
coloured490
 
0.8%
salty abandoned339
 
0.6%
fluoride200
 
0.3%
fluoride abandoned17
 
< 0.1%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
soft50818
85.0%
salty5195
 
8.7%
unknown1876
 
3.1%
milky804
 
1.3%
coloured490
 
0.8%
abandoned356
 
0.6%
fluoride217
 
0.4%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

quality_group
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct6
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size928.1 KiB
good
50818 
salty
5195 
unknown
 
1876
milky
 
804
colored
 
490

Length

Max length8
Median length4
Mean length4.23510101
Min length4

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowgood
2nd rowgood
3rd rowgood
4th rowgood
5th rowgood

Common Values

ValueCountFrequency (%)
good50818
85.6%
salty5195
 
8.7%
unknown1876
 
3.2%
milky804
 
1.4%
colored490
 
0.8%
fluoride217
 
0.4%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
good50818
85.6%
salty5195
 
8.7%
unknown1876
 
3.2%
milky804
 
1.4%
colored490
 
0.8%
fluoride217
 
0.4%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

quantity
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct5
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size928.1 KiB
enough
33186 
insufficient
15129 
dry
6246 
seasonal
4050 
unknown
 
789

Length

Max length12
Median length6
Mean length7.362373737
Min length3

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowenough
2nd rowinsufficient
3rd rowenough
4th rowdry
5th rowseasonal

Common Values

ValueCountFrequency (%)
enough33186
55.9%
insufficient15129
25.5%
dry6246
 
10.5%
seasonal4050
 
6.8%
unknown789
 
1.3%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
enough33186
55.9%
insufficient15129
25.5%
dry6246
 
10.5%
seasonal4050
 
6.8%
unknown789
 
1.3%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

quantity_group
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct5
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size928.1 KiB
enough
33186 
insufficient
15129 
dry
6246 
seasonal
4050 
unknown
 
789

Length

Max length12
Median length6
Mean length7.362373737
Min length3

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowenough
2nd rowinsufficient
3rd rowenough
4th rowdry
5th rowseasonal

Common Values

ValueCountFrequency (%)
enough33186
55.9%
insufficient15129
25.5%
dry6246
 
10.5%
seasonal4050
 
6.8%
unknown789
 
1.3%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
enough33186
55.9%
insufficient15129
25.5%
dry6246
 
10.5%
seasonal4050
 
6.8%
unknown789
 
1.3%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

source
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct10
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size928.1 KiB
spring
17021 
shallow well
16824 
machine dbh
11075 
river
9612 
rainwater harvesting
2295 
Other values (5)
2573 

Length

Max length20
Median length11
Mean length8.978804714
Min length3

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowspring
2nd rowrainwater harvesting
3rd rowdam
4th rowmachine dbh
5th rowrainwater harvesting

Common Values

ValueCountFrequency (%)
spring17021
28.7%
shallow well16824
28.3%
machine dbh11075
18.6%
river9612
16.2%
rainwater harvesting2295
 
3.9%
hand dtw874
 
1.5%
lake765
 
1.3%
dam656
 
1.1%
other212
 
0.4%
unknown66
 
0.1%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
spring17021
18.8%
well16824
18.6%
shallow16824
18.6%
dbh11075
12.2%
machine11075
12.2%
river9612
10.6%
harvesting2295
 
2.5%
rainwater2295
 
2.5%
dtw874
 
1.0%
hand874
 
1.0%
Other values (4)1699
 
1.9%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

source_type
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct7
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size928.1 KiB
spring
17021 
shallow well
16824 
borehole
11949 
river/lake
10377 
rainwater harvesting
2295 
Other values (2)
 
934

Length

Max length20
Median length8
Mean length9.303602694
Min length3

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowspring
2nd rowrainwater harvesting
3rd rowdam
4th rowborehole
5th rowrainwater harvesting

Common Values

ValueCountFrequency (%)
spring17021
28.7%
shallow well16824
28.3%
borehole11949
20.1%
river/lake10377
17.5%
rainwater harvesting2295
 
3.9%
dam656
 
1.1%
other278
 
0.5%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
spring17021
21.7%
well16824
21.4%
shallow16824
21.4%
borehole11949
15.2%
river/lake10377
13.2%
harvesting2295
 
2.9%
rainwater2295
 
2.9%
dam656
 
0.8%
other278
 
0.4%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

source_class
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size928.1 KiB
groundwater
45794 
surface
13328 
unknown
 
278

Length

Max length11
Median length11
Mean length10.08377104
Min length7

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowgroundwater
2nd rowsurface
3rd rowsurface
4th rowgroundwater
5th rowsurface

Common Values

ValueCountFrequency (%)
groundwater45794
77.1%
surface13328
 
22.4%
unknown278
 
0.5%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
groundwater45794
77.1%
surface13328
 
22.4%
unknown278
 
0.5%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

waterpoint_type
Categorical

HIGH CORRELATION

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 928.1 KiB

communal standpipe | 28522
hand pump | 17488
other | 6380
communal standpipe multiple | 6103
improved spring | 784
Other values (2) | 123

Length

Max length: 27
Median length: 18
Mean length: 14.82757576
Min length: 3

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: communal standpipe
2nd row: communal standpipe
3rd row: communal standpipe multiple
4th row: communal standpipe multiple
5th row: communal standpipe

Common Values

Value | Count | Frequency (%)
communal standpipe | 28522 | 48.0%
hand pump | 17488 | 29.4%
other | 6380 | 10.7%
communal standpipe multiple | 6103 | 10.3%
improved spring | 784 | 1.3%
cattle trough | 116 | 0.2%
dam | 7 | < 0.1%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
standpipe | 34625 | 29.2%
communal | 34625 | 29.2%
pump | 17488 | 14.8%
hand | 17488 | 14.8%
other | 6380 | 5.4%
multiple | 6103 | 5.1%
spring | 784 | 0.7%
improved | 784 | 0.7%
trough | 116 | 0.1%
cattle | 116 | 0.1%


waterpoint_type_group
Categorical

HIGH CORRELATION

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 928.1 KiB

communal standpipe | 34625
hand pump | 17488
other | 6380
improved spring | 784
cattle trough | 116

Length

Max length: 18
Median length: 18
Mean length: 13.90287879
Min length: 3

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: communal standpipe
2nd row: communal standpipe
3rd row: communal standpipe
4th row: communal standpipe
5th row: communal standpipe

Common Values

Value | Count | Frequency (%)
communal standpipe | 34625 | 58.3%
hand pump | 17488 | 29.4%
other | 6380 | 10.7%
improved spring | 784 | 1.3%
cattle trough | 116 | 0.2%
dam | 7 | < 0.1%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
standpipe | 34625 | 30.8%
communal | 34625 | 30.8%
pump | 17488 | 15.6%
hand | 17488 | 15.6%
other | 6380 | 5.7%
spring | 784 | 0.7%
improved | 784 | 0.7%
trough | 116 | 0.1%
cattle | 116 | 0.1%
dam | 7 | < 0.1%


status_group
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 928.1 KiB

functional | 32259
non functional | 22824
functional needs repair | 4317

Length

Max length: 23
Median length: 10
Mean length: 12.48176768
Min length: 10

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: functional
2nd row: functional
3rd row: functional
4th row: non functional
5th row: functional

Common Values

Value | Count | Frequency (%)
functional | 32259 | 54.3%
non functional | 22824 | 38.4%
functional needs repair | 4317 | 7.3%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
functional | 59400 | 65.4%
non | 22824 | 25.1%
repair | 4317 | 4.8%
needs | 4317 | 4.8%


near_river
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 928.1 KiB

1.0 | 35481
0.0 | 23919

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 1.0
3rd row: 1.0
4th row: 0.0
5th row: 1.0

Common Values

Value | Count | Frequency (%)
1.0 | 35481 | 59.7%
0.0 | 23919 | 40.3%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
1.0 | 35481 | 59.7%
0.0 | 23919 | 40.3%


Interactions

[Figure: 100 pairwise interaction plots between the numeric variables]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables and is therefore better at catching nonlinear monotonic relationships than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
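
The calculation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code the report generator runs: the helper names (`average_ranks`, `pearson`, `spearman_rho`) and the toy data are assumptions made for the example, and ties are given the average of their rank positions.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance over the product of standard deviations (1/n factors cancel)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data (made up): y is a monotonic but nonlinear function of x.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
rho = spearman_rho(x, y)
r = pearson(x, y)
```

On this toy input ρ reaches 1 because the ordering of `y` matches the ordering of `x` exactly, while Pearson's r stays below 1 because the relationship is not linear.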

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear relationship the steepness of the line does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
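
As a rough sketch of that formula, and of the invariance property mentioned above, the following plain-Python example computes r for made-up data and again after shifting and rescaling both variables. The function name `pearson_r` and the numbers are illustrative assumptions, not values from the dataset.

```python
from statistics import mean


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations.

    The 1/n factors in the covariance and the standard deviations cancel,
    so plain sums of deviations are used.
    """
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# Toy data (made up): y is roughly 2 * x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
r = pearson_r(x, y)

# Shift and rescale each variable separately (positive scale factors).
x_scaled = [10 * a + 3 for a in x]
y_scaled = [0.5 * b - 7 for b in y]
r_scaled = pearson_r(x_scaled, y_scaled)
```

Here `r` and `r_scaled` agree up to floating-point rounding, which is the location and scale invariance in practice. A negative scale factor on one variable would flip the sign of r.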

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
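
The pair-counting rule can be written out directly. The sketch below implements the formula exactly as stated (the variant usually called tau-a), with tied pairs counted as neither concordant nor discordant. Statistical libraries often apply a tie correction (tau-b) instead, so results on data with ties may differ from the report's. The function name and toy data are assumptions for illustration.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # Positive product: both variables move in the same direction.
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


# Toy data (made up): 7 concordant and 3 discordant pairs out of 10.
x = [1, 2, 3, 4, 5]
y = [3, 1, 2, 5, 4]
tau = kendall_tau_a(x, y)
```

For this input `tau` is (7 - 3) / 10 = 0.4.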

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
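
A minimal sketch of a bias-corrected Cramér's V in the form commonly attributed to Bergsma (2013) is shown below. It is written from the published formula, not taken from the report generator, so the function name, the handling of degenerate tables, and the toy labels are assumptions.

```python
from collections import Counter
from math import sqrt


def cramers_v_corrected(a, b):
    """Bias-corrected Cramér's V for two equally long lists of labels."""
    n = len(a)
    joint = Counter(zip(a, b))
    row_totals = Counter(a)
    col_totals = Counter(b)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for row, row_n in row_totals.items():
        for col, col_n in col_totals.items():
            expected = row_n * col_n / n
            observed = joint.get((row, col), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    return sqrt(phi2_corr / denom) if denom > 0 else 0.0


# Toy labels (made up): the second list is fully determined by the first.
source_type = ["spring"] * 4 + ["dam"] * 4 + ["borehole"] * 4
source_class = ["groundwater"] * 4 + ["surface"] * 4 + ["groundwater"] * 4
v_dependent = cramers_v_corrected(source_type, source_class)

# Toy labels (made up): every combination occurs equally often.
u = ["x", "x", "y", "y"] * 3
w = ["p", "q", "p", "q"] * 3
v_independent = cramers_v_corrected(u, w)
```

On a sample this small the correction pulls the fully dependent case slightly below 1, while the balanced case gives exactly 0.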

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.

The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

The dendrogram lets you correlate variable completion more fully, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
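
To make the idea of nullity correlation concrete, the sketch below turns each column into a 0/1 presence indicator and correlates the indicators pairwise. It is an illustration under assumptions: the rows are made-up toy records that only borrow column names from this dataset, and the plotting library may compute its heatmap differently in detail.

```python
from statistics import mean


def pearson(x, y):
    """Pearson's r on two equally long numeric lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# Toy records (made up); None marks a missing cell.
rows = [
    {"funder": "A", "installer": "A", "scheme_name": "X"},
    {"funder": None, "installer": None, "scheme_name": None},
    {"funder": "B", "installer": "B", "scheme_name": None},
    {"funder": None, "installer": None, "scheme_name": "Y"},
    {"funder": "C", "installer": "C", "scheme_name": "Z"},
    {"funder": "D", "installer": "D", "scheme_name": None},
]
columns = ["funder", "installer", "scheme_name"]

# 1 where a value is present, 0 where it is missing.
present = {c: [0 if row[c] is None else 1 for row in rows] for c in columns}

# Pairwise correlation of the presence indicators.
nullity_corr = {
    a: {b: pearson(present[a], present[b]) for b in columns} for a in columns
}
```

In this toy data `funder` and `installer` are always missing together, so their nullity correlation is 1, while the missingness of `scheme_name` is unrelated to `funder` and gives 0. A column with no missing values has a constant indicator and would have to be skipped before calling `pearson`.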

Sample

First rows

index | id | amount_tsh | date_recorded | funder | gps_height | installer | longitude | latitude | wpt_name | num_private | basin | subvillage | region | region_code | district_code | lga | ward | population | public_meeting | recorded_by | scheme_management | scheme_name | permit | construction_year | extraction_type | extraction_type_group | extraction_type_class | management | management_group | payment | payment_type | water_quality | quality_group | quantity | quantity_group | source | source_type | source_class | waterpoint_type | waterpoint_type_group | status_group | near_river
0 | 69572 | 6000.0 | 2011-03-14 | Roman | 1390 | Roman | 34.938093 | -9.856322 | none | 0 | Lake Nyasa | Mnyusi B | Iringa | 11 | 5 | Ludewa | Mundindi | 109 | True | GeoData Consultants Ltd | VWC | Roman | False | 1999 | gravity | gravity | gravity | vwc | user-group | pay annually | annually | soft | good | enough | enough | spring | spring | groundwater | communal standpipe | communal standpipe | functional | 0.0
1 | 8776 | 0.0 | 2013-03-06 | Grumeti | 1399 | GRUMETI | 34.698766 | -2.147466 | Zahanati | 0 | Lake Victoria | Nyamara | Mara | 20 | 2 | Serengeti | Natta | 280 | NaN | GeoData Consultants Ltd | Other | NaN | True | 2010 | gravity | gravity | gravity | wug | user-group | never pay | never pay | soft | good | insufficient | insufficient | rainwater harvesting | rainwater harvesting | surface | communal standpipe | communal standpipe | functional | 1.0
2 | 34310 | 25.0 | 2013-02-25 | Lottery Club | 686 | World vision | 37.460664 | -3.821329 | Kwa Mahundi | 0 | Pangani | Majengo | Manyara | 21 | 4 | Simanjiro | Ngorika | 250 | True | GeoData Consultants Ltd | VWC | Nyumba ya mungu pipe scheme | True | 2009 | gravity | gravity | gravity | vwc | user-group | pay per bucket | per bucket | soft | good | enough | enough | dam | dam | surface | communal standpipe multiple | communal standpipe | functional | 1.0
3 | 67743 | 0.0 | 2013-01-28 | Unicef | 263 | UNICEF | 38.486161 | -11.155298 | Zahanati Ya Nanyumbu | 0 | Ruvuma / Southern Coast | Mahakamani | Mtwara | 90 | 63 | Nanyumbu | Nanyumbu | 58 | True | GeoData Consultants Ltd | VWC | NaN | True | 1986 | submersible | submersible | submersible | vwc | user-group | never pay | never pay | soft | good | dry | dry | machine dbh | borehole | groundwater | communal standpipe multiple | communal standpipe | non functional | 0.0
4 | 19728 | 0.0 | 2011-07-13 | Action In A | 0 | Artisan | 31.130847 | -1.825359 | Shuleni | 0 | Lake Victoria | Kyanyamisa | Kagera | 18 | 1 | Karagwe | Nyakasimbi | 0 | True | GeoData Consultants Ltd | NaN | NaN | True | 0 | gravity | gravity | gravity | other | other | never pay | never pay | soft | good | seasonal | seasonal | rainwater harvesting | rainwater harvesting | surface | communal standpipe | communal standpipe | functional | 1.0
5 | 9944 | 20.0 | 2011-03-13 | Mkinga Distric Coun | 0 | DWE | 39.172796 | -4.765587 | Tajiri | 0 | Pangani | Moa/Mwereme | Tanga | 4 | 8 | Mkinga | Moa | 1 | True | GeoData Consultants Ltd | VWC | Zingibali | True | 2009 | submersible | submersible | submersible | vwc | user-group | pay per bucket | per bucket | salty | salty | enough | enough | other | other | unknown | communal standpipe multiple | communal standpipe | functional | 1.0
6 | 19816 | 0.0 | 2012-10-01 | Dwsp | 0 | DWSP | 33.362410 | -3.766365 | Kwa Ngomho | 0 | Internal | Ishinabulandi | Shinyanga | 17 | 3 | Shinyanga Rural | Samuye | 0 | True | GeoData Consultants Ltd | VWC | NaN | True | 0 | swn 80 | swn 80 | handpump | vwc | user-group | never pay | never pay | soft | good | enough | enough | machine dbh | borehole | groundwater | hand pump | hand pump | non functional | 1.0
7 | 54551 | 0.0 | 2012-10-09 | Rwssp | 0 | DWE | 32.620617 | -4.226198 | Tushirikiane | 0 | Lake Tanganyika | Nyawishi Center | Shinyanga | 17 | 3 | Kahama | Chambo | 0 | True | GeoData Consultants Ltd | NaN | NaN | True | 0 | nira/tanira | nira/tanira | handpump | wug | user-group | unknown | unknown | milky | milky | enough | enough | shallow well | shallow well | groundwater | hand pump | hand pump | non functional | 0.0
8 | 53934 | 0.0 | 2012-11-03 | Wateraid | 0 | Water Aid | 32.711100 | -5.146712 | Kwa Ramadhan Musa | 0 | Lake Tanganyika | Imalauduki | Tabora | 14 | 6 | Tabora Urban | Itetemia | 0 | True | GeoData Consultants Ltd | VWC | NaN | True | 0 | india mark ii | india mark ii | handpump | vwc | user-group | never pay | never pay | salty | salty | seasonal | seasonal | machine dbh | borehole | groundwater | hand pump | hand pump | non functional | 0.0
9 | 46144 | 0.0 | 2011-08-03 | Isingiro Ho | 0 | Artisan | 30.626991 | -1.257051 | Kwapeto | 0 | Lake Victoria | Mkonomre | Kagera | 18 | 1 | Karagwe | Kaisho | 0 | True | GeoData Consultants Ltd | NaN | NaN | True | 0 | nira/tanira | nira/tanira | handpump | vwc | user-group | never pay | never pay | soft | good | enough | enough | shallow well | shallow well | groundwater | hand pump | hand pump | functional | 0.0

Last rows

index | id | amount_tsh | date_recorded | funder | gps_height | installer | longitude | latitude | wpt_name | num_private | basin | subvillage | region | region_code | district_code | lga | ward | population | public_meeting | recorded_by | scheme_management | scheme_name | permit | construction_year | extraction_type | extraction_type_group | extraction_type_class | management | management_group | payment | payment_type | water_quality | quality_group | quantity | quantity_group | source | source_type | source_class | waterpoint_type | waterpoint_type_group | status_group | near_river
59390 | 13677 | 0.0 | 2011-08-04 | Rudep | 1715 | DWE | 31.370848 | -8.258160 | Kwa Mzee Atanas | 0 | Lake Tanganyika | Kitonto | Rukwa | 15 | 2 | Sumbawanga Rural | Mkowe | 150 | True | GeoData Consultants Ltd | VWC | NaN | False | 1991 | swn 80 | swn 80 | handpump | vwc | user-group | never pay | never pay | soft | good | insufficient | insufficient | machine dbh | borehole | groundwater | hand pump | hand pump | functional | 1.0
59391 | 44885 | 0.0 | 2013-08-03 | Government Of Tanzania | 540 | Government | 38.044070 | -4.272218 | Kwa | 0 | Pangani | Maore Kati | Kilimanjaro | 3 | 3 | Same | Maore | 210 | True | GeoData Consultants Ltd | Water authority | Hingilili | True | 1967 | gravity | gravity | gravity | vwc | user-group | never pay | never pay | soft | good | enough | enough | river | river/lake | surface | communal standpipe | communal standpipe | non functional | 1.0
59392 | 40607 | 0.0 | 2011-04-15 | Government Of Tanzania | 0 | Government | 33.009440 | -8.520888 | Benard Charles | 0 | Lake Rukwa | Mbuyuni A | Mbeya | 12 | 1 | Chunya | Mbuyuni | 0 | True | GeoData Consultants Ltd | VWC | NaN | True | 0 | gravity | gravity | gravity | vwc | user-group | never pay | never pay | soft | good | enough | enough | spring | spring | groundwater | communal standpipe | communal standpipe | non functional | 0.0
59393 | 48348 | 0.0 | 2012-10-27 | Private | 0 | Private | 33.866852 | -4.287410 | Kwa Peter | 0 | Internal | Masanga | Tabora | 14 | 2 | Igunga | Igunga | 0 | False | GeoData Consultants Ltd | Water authority | NaN | False | 0 | gravity | gravity | gravity | private operator | commercial | pay per bucket | per bucket | soft | good | insufficient | insufficient | dam | dam | surface | other | other | functional | 0.0
59394 | 11164 | 500.0 | 2011-03-09 | World Bank | 351 | ML appro | 37.634053 | -6.124830 | Chimeredya | 0 | Wami / Ruvu | Komstari | Morogoro | 5 | 6 | Mvomero | Diongoya | 89 | True | GeoData Consultants Ltd | VWC | NaN | True | 2007 | submersible | submersible | submersible | vwc | user-group | pay monthly | monthly | soft | good | enough | enough | machine dbh | borehole | groundwater | communal standpipe | communal standpipe | non functional | 0.0
59395 | 60739 | 10.0 | 2013-05-03 | Germany Republi | 1210 | CES | 37.169807 | -3.253847 | Area Three Namba 27 | 0 | Pangani | Kiduruni | Kilimanjaro | 3 | 5 | Hai | Masama Magharibi | 125 | True | GeoData Consultants Ltd | Water Board | Losaa Kia water supply | True | 1999 | gravity | gravity | gravity | water board | user-group | pay per bucket | per bucket | soft | good | enough | enough | spring | spring | groundwater | communal standpipe | communal standpipe | functional | 0.0
59396 | 27263 | 4700.0 | 2011-05-07 | Cefa-njombe | 1212 | Cefa | 35.249991 | -9.070629 | Kwa Yahona Kuvala | 0 | Rufiji | Igumbilo | Iringa | 11 | 4 | Njombe | Ikondo | 56 | True | GeoData Consultants Ltd | VWC | Ikondo electrical water sch | True | 1996 | gravity | gravity | gravity | vwc | user-group | pay annually | annually | soft | good | enough | enough | river | river/lake | surface | communal standpipe | communal standpipe | functional | 1.0
59397 | 37057 | 0.0 | 2011-04-11 | NaN | 0 | NaN | 34.017087 | -8.750434 | Mashine | 0 | Rufiji | Madungulu | Mbeya | 12 | 7 | Mbarali | Chimala | 0 | True | GeoData Consultants Ltd | VWC | NaN | False | 0 | swn 80 | swn 80 | handpump | vwc | user-group | pay monthly | monthly | fluoride | fluoride | enough | enough | machine dbh | borehole | groundwater | hand pump | hand pump | functional | 1.0
59398 | 31282 | 0.0 | 2011-03-08 | Malec | 0 | Musa | 35.861315 | -6.378573 | Mshoro | 0 | Rufiji | Mwinyi | Dodoma | 1 | 4 | Chamwino | Mvumi Makulu | 0 | True | GeoData Consultants Ltd | VWC | NaN | True | 0 | nira/tanira | nira/tanira | handpump | vwc | user-group | never pay | never pay | soft | good | insufficient | insufficient | shallow well | shallow well | groundwater | hand pump | hand pump | functional | 0.0
59399 | 26348 | 0.0 | 2011-03-23 | World Bank | 191 | World | 38.104048 | -6.747464 | Kwa Mzee Lugawa | 0 | Wami / Ruvu | Kikatanyemba | Morogoro | 5 | 2 | Morogoro Rural | Ngerengere | 150 | True | GeoData Consultants Ltd | VWC | NaN | True | 2002 | nira/tanira | nira/tanira | handpump | vwc | user-group | pay when scheme fails | on failure | salty | salty | enough | enough | shallow well | shallow well | groundwater | hand pump | hand pump | functional | 1.0